Today we will learn

  1. Interaction Effects with two Continuous Variables
  2. Prediction So Far
  3. Meet the apply function
  4. Simulation
  • Expected Values
  • First Differences
  • Predicted Values

In other words, the goals are to:

  • use simulation to make statements about uncertainty of predicted/expected values
  • calculate quantities of interest

This is a very important lab session. You will benefit a lot from working through it step by step at home and rereading the King, Tomz, and Wittenberg (2000) article.

But first we have another look at interaction effects.


# The first line sets an option for the final document that can be produced from
# the .Rmd file. Don't worry about it.
knitr::opts_chunk$set(
  echo = TRUE,
  attr.output = 'style="max-height: 200px;"'
  # collapse = TRUE
)

# The next bit is quite powerful and useful.
# First you define which packages you need for your analysis and assign it to
# the p_needed object.
p_needed <-
  c("viridis", # we will use magma palette this time 
    "dplyr", # for preprocessing 
    "broom", # for tidy model output 
    "dagitty", # for the DAG in appendix
    "ggplot2",
    "scales",
    "equatiomatic", # to extract regression equations from models 
    "MASS" # to draw from the multivariate normal distribution
    )

# Now you check which packages are already installed on your computer.
# The function installed.packages() returns a vector with all the installed
# packages.
packages <- rownames(installed.packages())

# Then you check which of the packages you need are not installed on your
# computer yet. Essentially you compare the vector p_needed with the vector
# packages. The result of this comparison is assigned to p_to_install.
p_to_install <- p_needed[!(p_needed %in% packages)]
# If at least one element is in p_to_install you then install those missing
# packages.
if (length(p_to_install) > 0) {
  install.packages(p_to_install)
}

# Now that all packages are installed on the computer, you can load them for
# this project. Additionally the expression returns whether the packages were
# successfully loaded.
sapply(p_needed, require, character.only = TRUE)
## Loading required package: viridis
## Loading required package: viridisLite
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: broom
## Loading required package: dagitty
## Loading required package: ggplot2
## Loading required package: scales
## 
## Attaching package: 'scales'
## The following object is masked from 'package:viridis':
## 
##     viridis_pal
## Loading required package: equatiomatic
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
##      viridis        dplyr        broom      dagitty      ggplot2       scales 
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE 
## equatiomatic         MASS 
##         TRUE         TRUE
# This is an option for stargazer tables
# It automatically adapts the output to html or latex,
# depending on whether we want a html or pdf file
stargazer_opt <- ifelse(knitr::is_latex_output(), "latex", "html")

1 Understanding Marginal Effects for Categorical and Continuous Moderators

So far, we have only looked at interactions between a continuous predictor and a categorical moderating variable. We can, however, easily generalize this idea to continuous-by-continuous interactions.

The data frame ia_data contains four variables: an outcome variable y, a continuous predictor x, and two versions of a moderating variable: the continuous z_con and the binary z_cat.

Below, we run two models of the form \(\hat{y} = \beta_1 + \beta_2 x + \beta_3 z + \beta_4 x z\), using the continuous and categorical moderators, respectively.

## Load data
load("raw-data/ia_data.RData")

## Model with continuous moderator
mod_con <- lm(y ~ x * z_con, data = ia_data)
summary(mod_con)
## 
## Call:
## lm(formula = y ~ x * z_con, data = ia_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.954  -7.148   0.085   7.086  35.538 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.8324     0.8355   3.390 0.000713 ***
## x             2.6908     0.3768   7.141 1.29e-12 ***
## z_con         0.8038     0.3252   2.472 0.013534 *  
## x:z_con       1.0975     0.1469   7.473 1.16e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.29 on 1996 degrees of freedom
## Multiple R-squared:  0.3169, Adjusted R-squared:  0.3158 
## F-statistic: 308.6 on 3 and 1996 DF,  p-value: < 2.2e-16
## Model with categorical moderator
mod_cat <- lm(y ~ x * z_cat, data = ia_data)
summary(mod_cat)
## 
## Call:
## lm(formula = y ~ x * z_cat, data = ia_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.984  -7.207   0.014   7.344  39.591 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.1945     0.7625   4.189 2.92e-05 ***
## x             3.6484     0.3423  10.658  < 2e-16 ***
## z_cat         2.3655     1.0494   2.254   0.0243 *  
## x:z_cat       2.5030     0.4725   5.298 1.30e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.73 on 1996 degrees of freedom
## Multiple R-squared:  0.2578, Adjusted R-squared:  0.2567 
## F-statistic: 231.1 on 3 and 1996 DF,  p-value: < 2.2e-16

1.1 Calculate linear predictions

Next, we want to calculate the linear prediction, \(\hat{y}\), as a function of \(x\) at different values of the moderating variable \(z\). (Please note that this does not mean that \(z\) is a mediator in a causal graph. Here the term moderating variable simply means that the effect of \(x\) on \(y\) depends on \(z\).) With z_cat, this is easy: as before, we calculate values for two lines, one for z_cat = 0 and one for z_cat = 1. With the continuous moderator z_con, things are a bit more intricate: we need to select a range of values of z_con at which to calculate the relationship between \(y\) and \(x\). Here, we use the following values:

  • minimum
  • 1st percentile
  • 1st decile
  • 1st quartile
  • the median
  • 3rd quartile
  • 9th decile
  • 99th percentile
  • maximum
## function for the linear prediction
pred_lm <- function (b, x, z) {
  preds <- matrix(NA, length(x), length(z))
  for (i in seq_len(length(z))) {
    preds[, i] <- b[1] + b[2] * x + b[3] * z[i] + b[4] * x * z[i]
  }
  return(preds)
}

## use the function for the continuous model
b_con <- coef(mod_con)
x_vals <- seq(min(ia_data$x), max(ia_data$x), length.out = 101)
z_vals_con <- quantile(ia_data$z_con, 
                       c(0, .01, .1, .25, .5, .75, .9, .99, 1))
preds_con <- pred_lm(b_con, x_vals, z_vals_con)

## ..and for the categorical model
b_cat <- coef(mod_cat)
z_vals_cat <- c(0, 1)
preds_cat <- pred_lm(b_cat, x_vals, z_vals_cat)

1.2 Calculate marginal effects

Next, we want to calculate the marginal effect of each of the predicted lines. As we know from the lecture (Week 06, slide 27), if we have a regression equation of the form \(y = \beta_1 + \beta_2 x + \beta_3 z + \beta_4 x z + \epsilon\), then the marginal effect of \(x\) can be obtained by taking the partial derivative with respect to \(x\):

\[ \underbrace{\frac{\partial y}{\partial x}}_{\text{Notation for partial derivative}} = \underbrace{\beta_2 + \beta_4 z}_{\text{Marginal effect of x}} \]

## function for the marginal effect
mfx_lm <- function (b, z) {
  b[2] + b[4] * z
}

## use the function for the continuous model
mfx_con <- mfx_lm(b_con, z_vals_con)

## ..and for the categorical model
mfx_cat <- mfx_lm(b_cat, z_vals_cat)

1.3 Calculate standard errors of estimated marginal effects

Lastly, we also want to calculate the standard errors for our estimated marginal effects of \(x\). Per the lecture slides (Week 06, slide 26), we know the formula is

\[ \hat{\sigma}_{\frac{\partial y}{\partial x}} = \sqrt{\text{var}(\hat{\beta_2}) + z^2 \text{var}(\hat{\beta_4}) + 2 z \text{cov}(\hat{\beta_2}, \hat{\beta_4})} \]

## function for the standard error of the marginal effect
se_mfx <- function (vcov, z) {
  sqrt(vcov[2, 2] + z ^ 2 * vcov[4, 4] + 2 * z * vcov[4, 2])
}

## use the function for the continuous model
vcov_con <- vcov(mod_con)
se_mfx_con <- se_mfx(vcov_con, z_vals_con)

## ..and for the categorical model
vcov_cat <- vcov(mod_cat)
se_mfx_cat <- se_mfx(vcov_cat, z_vals_cat)

1.4 Plot the results

Using our stored objects for

  • the linear predictions of \(y\) as a function of \(x\) at the specified values of \(z\).
  • the marginal effects \(\frac{\partial y}{\partial x}\) at the specified values of \(z\).
  • and the corresponding standard errors \(\hat{\sigma}_{\frac{\partial y}{\partial x}}\).

we can now plot the linear predictions and the marginal effects (with confidence intervals) for both scenarios of categorical and continuous moderation side-by-side:

## set up a 2-by-2 plot (and adjust plot margins)
par(mfrow = c(2, 2),
    mar = c(5.1, 6.1, 4.1, 2.1))

## Plot 1: Linear Prediction (Categorical)
col_vec <- viridis(length(se_mfx_cat))
plot(
  x = ia_data$x,
  y = ia_data$y,
  pch = 16,
  xlab = "x",
  ylab = "y",
  type = "n",
  bty = "n",
  las = 1,
  main = "Linear Prediction (Categorical)"
)

for (i in seq_len(ncol(preds_cat))) {
  lines(
    x = x_vals,
    y = preds_cat[, i],
    lty = i,
    col = col_vec[i]
  )
}

## Plot 2: Marginal Effect (Categorical)
col_vec <- viridis(length(se_mfx_cat))
plot(
  x = z_vals_cat,
  y = mfx_cat,
  pch = 16,
  xlab = "z",
  ylab = "", # added manually later
  type = "n",
  las = 1,
  main = "Marginal Effect (Categorical)",
  xlim = c(-1, 2),
  ylim = c(-2, 12),
  axes = FALSE,
  bty = "n"
)

abline(h = 0, col = 'gray60', lwd = .5)
axis(1,
     at = z_vals_cat)
axis(2, las = 1)

# add the label for y-axis separately, horizontally 
text(bquote(frac(partialdiff ~ y, partialdiff ~ x)), xpd = TRUE, x = -1.6, y = 6)

for (i in 1:length(se_mfx_cat)) {
  points(
    x = z_vals_cat[i],
    y = mfx_cat[i],
    pch = 16,
    col = col_vec[i]
  )
}
for (i in 1:length(se_mfx_cat)) {
  segments(
    z_vals_cat[i],
    mfx_cat[i] + qnorm(.025) * se_mfx_cat[i],
    z_vals_cat[i],
    mfx_cat[i] + qnorm(.975) * se_mfx_cat[i],
    col = col_vec[i]
  )
}

## Plot 3: Linear Prediction (Continuous)
col_vec <- viridis(length(se_mfx_con))
plot(
  x = ia_data$x,
  y = ia_data$y,
  pch = 16,
  xlab = "x",
  ylab = "y",
  type = "n",
  bty = "n",
  las = 1,
  main = "Linear Prediction (Continuous)"
)

for (i in 1:ncol(preds_con)) {
  lines(
    x = x_vals,
    y = preds_con[, i],
    lty = i,
    col = col_vec[i]
  )
}

## Plot 4: Marginal Effect (Continuous)
col_vec <- viridis(length(se_mfx_con))
plot(
  x = z_vals_con,
  y = mfx_con,
  pch = 16,
  xlab = "z",
  ylab = "",
  type = "n",
  axes = FALSE,
  bty = "n",
  main = "Marginal Effect (Continuous)",
  xlim = c(-4, 8),
  ylim = c(-3, 13)
)
axis(1)
axis(2,
     las = 1,
     at = seq(-2,12, by = 2))

abline(h = 0, col = 'gray60', lwd = .5)

text(bquote(frac(partialdiff ~ y, partialdiff ~ x)), xpd = TRUE, x = -6.2, y = 6)

for (i in 1:length(se_mfx_con)) {
  points(
    x = z_vals_con[i],
    y = mfx_con[i],
    pch = 16,
    col = col_vec[i]
  )
}
for (i in 1:length(se_mfx_con)) {
  segments(
    z_vals_con[i],
    mfx_con[i] + qnorm(.025) * se_mfx_con[i],
    z_vals_con[i],
    mfx_con[i] + qnorm(.975) * se_mfx_con[i],
    col = col_vec[i]
  )
}

## Add contiguous lines for mfx and se's to the last plot
## Compute first...
z_vals_fine <-
  seq(min(ia_data$z_con), max(ia_data$z_con), length.out = 101)
mfx_fine <- mfx_lm(b_con, z_vals_fine)
se_mfx_fine <- se_mfx(vcov_con, z_vals_fine)

## ... then plot
lines(z_vals_fine, mfx_fine, col = adjustcolor("black", alpha = 0.5))
lines(
  z_vals_fine,
  mfx_fine + qnorm(.025) * se_mfx_fine,
  lty = 2,
  col = adjustcolor("black", alpha = 0.5)
)
lines(
  z_vals_fine,
  mfx_fine + qnorm(.975) * se_mfx_fine,
  lty = 2,
  col = adjustcolor("black", alpha = 0.5)
)

1.5 Review: Prediction So Far

We use the 2013 election data set again.

load(file = "raw-data/election2013_2.RData")

df <- as.data.frame(election2013_2)

Similar to the homework, we regress leftvote on unemployment and east and additionally include the multiplicative interaction term unemployment*east. (In R's formula syntax, unemployment * east alone would already expand to both main effects plus the interaction; R silently drops the duplicated terms.)

reg <- lm(leftvote ~ unemployment + east + 
            unemployment*east, 
          data = df)
summary(reg)
## 
## Call:
## lm(formula = leftvote ~ unemployment + east + unemployment * 
##     east, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8438 -0.8513 -0.1683  0.5337 11.4562 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.26144    0.30817   7.338 2.11e-12 ***
## unemployment       0.54859    0.04739  11.576  < 2e-16 ***
## east              16.10218    1.43807  11.197  < 2e-16 ***
## unemployment:east -0.13650    0.14407  -0.947    0.344    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.874 on 295 degrees of freedom
## Multiple R-squared:  0.9295, Adjusted R-squared:  0.9288 
## F-statistic:  1297 on 3 and 295 DF,  p-value: < 2.2e-16

Review: How did we make predictions so far?

# 1. Get the coefficients.
intercept <- coef(reg)[1]
slopes <- coef(reg)[2:4]

# 2. Choose interesting covariates' values.
x_east0 <- 0
x_east1 <- 1
x_unemp <- seq(0, 20, 0.1)

# 3.1. Write your predict function.
predict_mu <- function(intercept, slopes, x1, x2) {
  intercept + slopes[1] * x1 + slopes[2] * x2 + slopes[3] * x1 * x2
}

# 3.2. Let the function do the work for you.
pred_leftvote_west <-
  predict_mu(intercept, slopes, x1 = x_unemp, x2 = x_east0)
pred_leftvote_east <-
  predict_mu(intercept, slopes, x1 = x_unemp, x2 = x_east1)
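
Hand-rolled prediction functions are easy to get wrong, so it is worth checking them against R's built-in predict(). A minimal self-contained sketch with made-up data (the model, variable names, and numbers below are purely illustrative):

```r
set.seed(123)
toy <- data.frame(x1 = rnorm(50), x2 = rbinom(50, 1, 0.5))
toy$y <- 1 + 2 * toy$x1 - 0.5 * toy$x2 + toy$x1 * toy$x2 + rnorm(50)

fit <- lm(y ~ x1 * x2, data = toy)
b <- coef(fit)

# manual prediction, mirroring the logic of predict_mu() above
manual <- b[1] + b[2] * toy$x1 + b[3] * toy$x2 + b[4] * toy$x1 * toy$x2

# should agree with the built-in predictions up to numerical precision
all.equal(unname(manual), unname(predict(fit)))
```

If the two disagree, the usual culprit is a scenario vector whose entries are not in the same order as coef().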

Base R

# 4. Plot it.
# 4.1. Plot the observations.
plot(
  x_unemp,
  pred_leftvote_west,
  type = "n",
  bty = "n",
  las = 1,
  lwd = 2,
  ylim = c(0, 30),
  ylab = "Predicted Voteshare (Die Linke) in %",
  xlab = "Unemployment in %",
  main = "Left Voteshare and Unemployment"
)

points(df$unemployment,
       df$leftvote ,
       pch = 19,
       col = ifelse(df$east == 1, 
                    viridis(2, alpha = 0.3, end = 0.5)[1], 
                    viridis(2, alpha = 0.3, end = 0.5)[2])
       )

# 4.2. Add the lines on top.

lines(
  x = x_unemp,
  y = pred_leftvote_west,
  lwd = 2,
  col = viridis(2, end = 0.5)[2]
)
lines(
  x = x_unemp,
  y = pred_leftvote_east,
  lwd = 2,
  col = viridis(2, end = 0.5)[1]
)

# What is missing?

ggplot2

ggplot(
  data = df,
  aes(x = unemployment, y = leftvote, group = as.character(east))
) +
  geom_point(aes(color = east), alpha = 0.5) +
  geom_line(mapping = aes(y = predict(reg), 
                          color = east)) +
  theme_classic() +
  labs(
    x = "Unemployment in %",
    y = "Predicted Voteshare (Die Linke) in %",
    color = "",
    title = "Left Voteshare and Unemployment"
  ) +
  theme(legend.position = "none")

2 Meet the apply function

Before we start with our simulations, meet the apply function.

Description: “Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.”

vars <- df[, 2:4]

vars
##     leftvote unemployment east
## 1       5.70          8.6    0
## 2       4.40          7.5    0
## 3       5.00          6.2    0
## 4       4.40          5.4    0
## 5       6.90          9.3    0
## 6       4.80          7.5    0
## 7       5.00          5.2    0
## 8       5.00          4.6    0
## 9       4.40          7.0    0
## 10      4.90          5.2    0
## 11      6.50          9.3    0
## 12     20.40          9.9    1
## 13     21.30          9.5    1
## 14     23.70         11.1    1
## 15     20.60         14.9    1
## 16     21.60         14.0    1
## 17     21.50         12.1    1
## 18     11.00          7.1    0
## 19     10.70          7.1    0
## 20      8.60          7.1    0
## 21      6.50          7.1    0
## 22      7.60          7.1    0
## 23      8.40          7.1    0
## 24      4.96          8.6    0
## 25      4.10          5.1    0
## 26      5.10          8.9    0
## 27      6.50          6.4    0
## 28      5.50          6.8    0
## 29      4.60          6.2    0
## 30      4.20          5.5    0
## 31      3.00          4.1    0
## 32      2.80          4.6    0
## 33      4.50          4.7    0
## 34      5.60          4.9    0
## 35      4.60          5.9    0
## 36      4.40          4.7    0
## 37      6.90          7.1    0
## 38      4.00          3.7    0
## 39      5.10          6.2    0
## 40      4.30          6.5    0
## 41      6.20          8.0    0
## 42      8.00          8.0    0
## 43      4.50          8.0    0
## 44      4.50          7.3    0
## 45      4.70          5.2    0
## 46      5.30          7.9    0
## 47      4.80          8.0    0
## 48      4.98          7.6    0
## 49      5.60          7.6    0
## 50      6.70          7.1    0
## 51      4.90          5.8    0
## 52      5.10          7.9    0
## 53      6.30          6.2    0
## 54     10.10         10.2    0
## 55     10.00         11.6    0
## 56     22.50         11.0    1
## 57     23.80         13.8    1
## 58     18.40          8.7    1
## 59     26.30          9.8    1
## 60     22.90          9.4    1
## 61     20.70          7.1    1
## 62     21.80          7.6    1
## 63     24.70          9.9    1
## 64     22.60         11.0    1
## 65     21.70         13.3    1
## 66     24.60         12.2    1
## 67     21.70          9.1    1
## 68     22.90          9.7    1
## 69     24.00         11.3    1
## 70     22.80         11.1    1
## 71     25.60         11.9    1
## 72     24.40         11.4    1
## 73     23.70         12.1    1
## 74     25.70         12.0    1
## 75     18.70         11.6    0
## 76     25.20         11.6    1
## 77      8.00         11.6    0
## 78      9.50         11.6    0
## 79      7.20         11.6    0
## 80      8.90         11.6    0
## 81     10.30         11.6    0
## 82     14.30         11.6    1
## 83     25.10         11.6    1
## 84     29.50         11.6    1
## 85     32.90         11.6    1
## 86     34.60         11.6    1
## 87      7.70          8.4    0
## 88      6.50          8.4    0
## 89      5.20          7.0    0
## 90      5.70          7.8    0
## 91      5.20          7.4    0
## 92      5.20          6.6    0
## 93      8.10          9.0    0
## 94      6.90          9.0    0
## 95      9.20          9.0    0
## 96      6.40          6.7    0
## 97      5.60          5.5    0
## 98      4.60          5.5    0
## 99      5.40          6.0    0
## 100     5.10          6.2    0
## 101     7.20          8.4    0
## 102     8.70         11.9    0
## 103     6.80          9.4    0
## 104     5.10          6.8    0
## 105     5.50          6.8    0
## 106     6.40          8.4    0
## 107     7.80          8.4    0
## 108     4.97          5.9    0
## 109     6.40         10.6    0
## 110     5.20          7.8    0
## 111     5.40          7.0    0
## 112     4.60          6.2    0
## 113     6.00          6.7    0
## 114     6.60          8.4    0
## 115     7.90         12.3    0
## 116     8.80         12.3    0
## 117     8.00         10.5    0
## 118     6.40          9.2    0
## 119     8.10         12.3    0
## 120     6.60         12.3    0
## 121     6.80         10.7    0
## 122     6.30         10.7    0
## 123     7.60         13.5    0
## 124     4.40          4.4    0
## 125     6.20          9.8    0
## 126     3.60          4.1    0
## 127     4.20          3.2    0
## 128     4.70          4.5    0
## 129     6.30          5.9    0
## 130     4.40          5.7    0
## 131     4.98          4.9    0
## 132     8.40          8.9    0
## 133     5.80          5.9    0
## 134     4.90          5.6    0
## 135     5.10          7.6    0
## 136     5.10          6.1    0
## 137     5.20          6.0    0
## 138     6.70          9.1    0
## 139     6.40          7.2    0
## 140     7.90          9.6    0
## 141     8.00         11.8    0
## 142     7.90         12.6    0
## 143     7.80         12.6    0
## 144     6.20          9.0    0
## 145     6.30          9.9    0
## 146     5.20          6.1    0
## 147     4.70          5.1    0
## 148     5.80          5.4    0
## 149     4.80          5.7    0
## 150     5.90          6.9    0
## 151    20.60         10.3    1
## 152    21.30         10.8    1
## 153    22.60         10.8    1
## 154    19.90          9.2    1
## 155    18.70          9.0    1
## 156    19.90          9.4    1
## 157    19.20         12.4    1
## 158    17.10          8.8    1
## 159    19.00          8.8    1
## 160    18.10          8.9    1
## 161    20.60          8.6    1
## 162    23.00         10.1    1
## 163    20.00          8.6    1
## 164    19.40          8.8    1
## 165    21.30          8.3    1
## 166    20.20          8.4    1
## 167     5.50          4.8    0
## 168     8.70          8.1    0
## 169     6.00          5.3    0
## 170     5.40          4.8    0
## 171     6.80          4.4    0
## 172     5.20          5.9    0
## 173     6.50          6.3    0
## 174     4.60          3.6    0
## 175     5.50          4.9    0
## 176     4.60          4.3    0
## 177     5.20          4.9    0
## 178     4.50          4.7    0
## 179     5.90          7.2    0
## 180     6.10          4.9    0
## 181     4.20          4.2    0
## 182     8.90          7.2    0
## 183     8.10          7.2    0
## 184     6.20          5.8    0
## 185     6.60          6.9    0
## 186     6.70          5.2    0
## 187     5.50          5.1    0
## 188     4.90          4.6    0
## 189    19.90          8.6    1
## 190    22.50          7.7    1
## 191    24.50          9.5    1
## 192    22.10          8.0    1
## 193    23.00          8.8    1
## 194    25.60          8.4    1
## 195    23.00         10.1    1
## 196    24.80          6.9    1
## 197    25.00          6.2    1
## 198     5.20          5.5    0
## 199     4.70          4.4    0
## 200     5.20          5.2    0
## 201     4.80          4.2    0
## 202     5.80          6.6    0
## 203     4.40          3.7    0
## 204     6.30          4.2    0
## 205     5.10          3.7    0
## 206     5.50          5.0    0
## 207     5.10          5.4    0
## 208     5.60          6.5    0
## 209     4.80          4.5    0
## 210     7.60          7.0    0
## 211     6.50          6.5    0
## 212     4.80          4.2    0
## 213     2.90          3.7    0
## 214     2.90          2.0    0
## 215     3.30          2.2    0
## 216     3.00          2.7    0
## 217     3.60          2.2    0
## 218     4.50          4.9    0
## 219     4.20          4.9    0
## 220     4.80          4.9    0
## 221     4.80          4.9    0
## 222     2.80          3.0    0
## 223     2.90          3.1    0
## 224     2.60          2.8    0
## 225     3.00          3.4    0
## 226     3.10          3.1    0
## 227     3.30          3.4    0
## 228     3.00          3.0    0
## 229     3.80          4.3    0
## 230     2.70          2.9    0
## 231     3.00          3.9    0
## 232     3.40          3.1    0
## 233     3.90          3.1    0
## 234     3.20          3.1    0
## 235     3.40          4.9    0
## 236     4.10          3.6    0
## 237     3.30          4.3    0
## 238     4.00          4.4    0
## 239     4.80          5.0    0
## 240     3.60          3.7    0
## 241     3.90          3.0    0
## 242     4.40          3.1    0
## 243     4.98          4.1    0
## 244     7.10          7.6    0
## 245     5.80          7.1    0
## 246     3.80          2.8    0
## 247     3.60          3.7    0
## 248     4.50          3.6    0
## 249     3.40          2.9    0
## 250     4.96          3.5    0
## 251     4.00          3.5    0
## 252     5.60          5.6    0
## 253     3.20          2.5    0
## 254     3.30          2.2    0
## 255     3.70          3.1    0
## 256     3.60          3.7    0
## 257     3.20          3.1    0
## 258     6.20          5.9    0
## 259     6.70          5.9    0
## 260     4.40          3.5    0
## 261     4.60          3.4    0
## 262     4.10          3.4    0
## 263     4.20          4.1    0
## 264     4.50          3.8    0
## 265     4.70          3.8    0
## 266     4.10          3.7    0
## 267     4.70          4.5    0
## 268     4.80          3.2    0
## 269     4.20          3.6    0
## 270     4.70          4.2    0
## 271     6.00          5.4    0
## 272     4.10          3.1    0
## 273     4.10          3.7    0
## 274     5.70          4.5    0
## 275     7.50          5.9    0
## 276     4.20          3.5    0
## 277     4.80          4.1    0
## 278     4.40          3.5    0
## 279     4.60          4.3    0
## 280     4.20          3.9    0
## 281     7.90          5.1    0
## 282     4.80          3.5    0
## 283     4.60          3.3    0
## 284     4.70          3.7    0
## 285     4.00          3.1    0
## 286     4.10          3.8    0
## 287     5.00          4.3    0
## 288     4.30          3.1    0
## 289     4.80          3.8    0
## 290     6.60          3.4    0
## 291     4.30          3.8    0
## 292     3.40          2.7    0
## 293     4.40          3.2    0
## 294     4.30          2.8    0
## 295     4.10          4.0    0
## 296    11.70          9.0    0
## 297     9.00          5.6    0
## 298     9.00          5.8    0
## 299    10.20          6.7    0

What are margins?

Use MARGIN = 1 for rows, 2 for columns, and c(1, 2) for both.

General usage: apply(X, MARGIN, FUN, ...)
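
A tiny example makes the MARGIN argument concrete (the matrix below is made up for illustration):

```r
# a 2-by-3 toy matrix; matrix() fills column by column
m <- matrix(1:6, nrow = 2)

apply(m, 1, sum)  # row sums: one value per row -> 9 12
apply(m, 2, sum)  # column sums: one value per column -> 3 7 11
```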

Let’s start with column means.

means <- apply(vars, 2,  mean)

means
##     leftvote unemployment         east 
##     8.730870     6.768896     0.187291

Use apply to plot…

apply(vars, 2, hist, col = viridis(1), border = "white", las = 1)

## $leftvote
## $breaks
## [1]  0  5 10 15 20 25 30 35
## 
## $counts
## [1] 120 116   7  12  35   7   2
## 
## $density
## [1] 0.080267559 0.077591973 0.004682274 0.008026756 0.023411371 0.004682274
## [7] 0.001337793
## 
## $mids
## [1]  2.5  7.5 12.5 17.5 22.5 27.5 32.5
## 
## $xname
## [1] "newX[, i]"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
## 
## $unemployment
## $breaks
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15
## 
## $counts
##  [1] 15 48 43 36 29 33 29 18 12 21 10  4  1
## 
## $density
##  [1] 0.050167224 0.160535117 0.143812709 0.120401338 0.096989967 0.110367893
##  [7] 0.096989967 0.060200669 0.040133779 0.070234114 0.033444816 0.013377926
## [13] 0.003344482
## 
## $mids
##  [1]  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5
## 
## $xname
## [1] "newX[, i]"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
## 
## $east
## $breaks
##  [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
## 
## $counts
##  [1] 243   0   0   0   0   0   0   0   0  56
## 
## $density
##  [1] 8.12709 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## [10] 1.87291
## 
## $mids
##  [1] 0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95
## 
## $xname
## [1] "newX[, i]"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
# How about row sums?

sums <- apply(vars, 1, sum)

sums
##   [1] 14.30 11.90 11.20  9.80 16.20 12.30 10.20  9.60 11.40 10.10 15.80 31.30
##  [13] 31.80 35.80 36.50 36.60 34.60 18.10 17.80 15.70 13.60 14.70 15.50 13.56
##  [25]  9.20 14.00 12.90 12.30 10.80  9.70  7.10  7.40  9.20 10.50 10.50  9.10
##  [37] 14.00  7.70 11.30 10.80 14.20 16.00 12.50 11.80  9.90 13.20 12.80 12.58
##  [49] 13.20 13.80 10.70 13.00 12.50 20.30 21.60 34.50 38.60 28.10 37.10 33.30
##  [61] 28.80 30.40 35.60 34.60 36.00 37.80 31.80 33.60 36.30 34.90 38.50 36.80
##  [73] 36.80 38.70 30.30 37.80 19.60 21.10 18.80 20.50 21.90 26.90 37.70 42.10
##  [85] 45.50 47.20 16.10 14.90 12.20 13.50 12.60 11.80 17.10 15.90 18.20 13.10
##  [97] 11.10 10.10 11.40 11.30 15.60 20.60 16.20 11.90 12.30 14.80 16.20 10.87
## [109] 17.00 13.00 12.40 10.80 12.70 15.00 20.20 21.10 18.50 15.60 20.40 18.90
## [121] 17.50 17.00 21.10  8.80 16.00  7.70  7.40  9.20 12.20 10.10  9.88 17.30
## [133] 11.70 10.50 12.70 11.20 11.20 15.80 13.60 17.50 19.80 20.50 20.40 15.20
## [145] 16.20 11.30  9.80 11.20 10.50 12.80 31.90 33.10 34.40 30.10 28.70 30.30
## [157] 32.60 26.90 28.80 28.00 30.20 34.10 29.60 29.20 30.60 29.60 10.30 16.80
## [169] 11.30 10.20 11.20 11.10 12.80  8.20 10.40  8.90 10.10  9.20 13.10 11.00
## [181]  8.40 16.10 15.30 12.00 13.50 11.90 10.60  9.50 29.50 31.20 35.00 31.10
## [193] 32.80 35.00 34.10 32.70 32.20 10.70  9.10 10.40  9.00 12.40  8.10 10.50
## [205]  8.80 10.50 10.50 12.10  9.30 14.60 13.00  9.00  6.60  4.90  5.50  5.70
## [217]  5.80  9.40  9.10  9.70  9.70  5.80  6.00  5.40  6.40  6.20  6.70  6.00
## [229]  8.10  5.60  6.90  6.50  7.00  6.30  8.30  7.70  7.60  8.40  9.80  7.30
## [241]  6.90  7.50  9.08 14.70 12.90  6.60  7.30  8.10  6.30  8.46  7.50 11.20
## [253]  5.70  5.50  6.80  7.30  6.30 12.10 12.60  7.90  8.00  7.50  8.30  8.30
## [265]  8.50  7.80  9.20  8.00  7.80  8.90 11.40  7.20  7.80 10.20 13.40  7.70
## [277]  8.90  7.90  8.90  8.10 13.00  8.30  7.90  8.40  7.10  7.90  9.30  7.40
## [289]  8.60 10.00  8.10  6.10  7.60  7.10  8.10 20.70 14.60 14.80 16.90
# Combining functions in a function.

multi_fun <- function(x) {
  c(min = min(x),
    mean = mean(x),
    max = max(x))
}

# And then use the multi_fun with apply.

apply(vars, 2, multi_fun)
##      leftvote unemployment     east
## min   2.60000     2.000000 0.000000
## mean  8.73087     6.768896 0.187291
## max  34.60000    14.900000 1.000000

2.1 Working with the apply function

  1. We want to quickly calculate the 2.5%, 50%, and 97.5% quantiles of the variables in the data set using the apply function. Just as with the plots above, we can pass additional arguments through apply to the function being applied. Here: probs = c(0.025, 0.5, 0.975).

  2. Now combine some functions to get the 2.5 % and 97.5 % quantiles as well as the mean. We call this function quants_mean_fun.

quants <- apply(vars, 2, quantile, probs = c(0.025, 0.5, 0.975))

quants
##       leftvote unemployment east
## 2.5%     3.000        2.800    0
## 50%      5.600        6.200    0
## 97.5%   25.155       12.355    1
# Now combine some functions to get the 2.5 % and 97.5 % quantiles
# as well as the mean.

quants_mean_fun <-  function(x) {
  c(quants = quantile(x, probs = c(0.025, 0.975)),
    mean = mean(x))
}

quants_mean <- apply(vars, 2, quants_mean_fun)
quants_mean
##              leftvote unemployment     east
## quants.2.5%   3.00000     2.800000 0.000000
## quants.97.5% 25.15500    12.355000 1.000000
## mean          8.73087     6.768896 0.187291

Become familiar with apply and your R code will be short and fast!
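
For data frames specifically, sapply() (or vapply() with an explicit output type) does the same column-wise work and is often more convenient. A small made-up example:

```r
toy <- data.frame(a = 1:4, b = c(2, 4, 6, 8))

sapply(toy, mean)              # named vector of column means: 2.5 and 5
vapply(toy, mean, numeric(1))  # same result, but with a type-checked output
```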

3 Simulation of Expected Values

Now we get started with simulations. I hope you have all read the King, Tomz, and Wittenberg (2000) article and remember the five steps of simulation from the lecture.

Our first goal is to get so-called expected values, \(E(Y|X)\). Expected values are the average (expected) value of a variable \(Y\), conditional on a particular set of \(X\) values. For example, we could be interested in the expected vote share of the party Die Linke in a West German district with an unemployment rate of \(6.7\%\) (roughly the nationwide average). Since unemployment enters the model in percentage points, this is \(E(\text{Leftvote} \mid \text{West}, \text{Unempl.} = 6.7)\).

Let’s do this:

Step 1 - Get the regression coefficients.

beta_hat <- coef(reg)

Step 2 - Generate sampling distribution.

Step 2.1. Get the variance-covariance matrix.

V_hat <- vcov(reg) 

# What are the diagonal elements?

sqrt(diag(V_hat))
##       (Intercept)      unemployment              east unemployment:east 
##        0.30817130        0.04739161        1.43806978        0.14407205

Step 2.2. Draw from the multivariate normal distribution.

# We need the MASS package

library(MASS)

# Set the number of draws/simulations.

nsim <- 1000 

# Draw from the multivariate normal distribution to get S.

S <- mvrnorm(nsim, beta_hat, V_hat)

dim(S) # Check dimensions
## [1] 1000    4
# We now can use S to get both expected and predicted values.
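As a quick (optional) plausibility check with the objects we just created: with many draws, the average of the simulated coefficients should be close to the point estimates.

```r
# Plausibility check: the column means of S should be close to beta_hat
# (and the sample covariance of S close to V_hat).
round(colMeans(S) - beta_hat, 3)
```

If these differences are not close to zero, something went wrong in the mvrnorm call (or nsim is very small).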

Step 3 - Choose interesting covariate values. Also known as: Set a scenario.

E.g., the difference in the voteshare of Die Linke between East and West.

Tip: double-check the ordering of the coefficients first so that you put the scenario values in the correct order.

names(beta_hat)
## [1] "(Intercept)"       "unemployment"      "east"             
## [4] "unemployment:east"
X_east <- c(1, mean(df$unemployment), 1, mean(df$unemployment) * 1) # East
X_west <- c(1, mean(df$unemployment), 0, mean(df$unemployment) * 0) # West
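Before simulating, we can also compute the plug-in point estimates for these two scenarios by hand; the means of the simulated expected values in Step 4 should land close to these numbers.

```r
# Plug-in expected values (point estimates, no uncertainty yet): X * beta_hat
c(east = sum(X_east * beta_hat),
  west = sum(X_west * beta_hat))
```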

Step 4 - Calculate Quantities of Interest

Expected Values \(E(Y|X)\)

EV_east <- S %*% as.matrix(X_east)  

# %*% is the operator for matrix multiplication

EV_west <- S %*% as.matrix(X_west)

# Even quicker: we put the scenarios in a matrix.

X <- as.matrix(rbind(X_east, X_west))

EV_combined <- S %*% t(X)

First Differences

A first difference is the difference between two expected values:

\[ \underbrace{FD}_{\text{First Difference}} = \underbrace{E(Y | X_{1})}_{\text{Expected Value of first scenario}} - \underbrace{E(Y | X_{2})}_{\text{Expected Value of second scenario}} \]

fd <- EV_combined[,1] - EV_combined[,2]
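A nice by-product of working with simulations: statements like "in what share of simulations is the expected value for the East scenario larger than for the West scenario?" are a one-liner (using the fd vector we just created).

```r
# Simulated probability that the expected voteshare is
# higher in the East scenario than in the West scenario
mean(fd > 0)
```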

Step 5 - Summarize Results

Plot the expected values for west.

Base R

par(mfrow = c(1,2))
hist(
  EV_combined[, 2],
  las = 1,
  col = viridis(4)[1],
  border = "white",
  main = "",
  xlab = "Expected Values for the voteshare of Die Linke for districts in the west
     (With unemployment at its mean.)",
)

# Get mean and quantiles. We use our quants_mean_fun from above.

quants_combined <- apply(EV_combined, 2, quants_mean_fun)

# Add the lines to the plot.

abline(v = c(quants_combined[, 2]),
       lty = 2,
       col = viridis(4)[4])

# Of course we can do the same for east.

hist(
  EV_combined[, 1],
  main = "",
  las = 1,
  col = viridis(4)[2],
  border = "white",
  xlab = "Expected Values for the voteshare of Die Linke for East
     (With unemployment at its mean.)",
)
abline(v = c(quants_combined[, 1]),
       lty = 2,
       col = viridis(4)[4])

# Similarly, we can plot the distribution of the First Differences
dev.off()
## null device 
##           1
hist(
  fd,
  main = "",
  las = 1,
  col = viridis(4)[3],
  border = "white",
  xlab = "First Differences for the voteshare of Die Linke between East and West
     (With unemployment at its mean.)"
)

# Get mean and quantiles.

quants_fd <- apply(as.matrix(fd), 2, quants_mean_fun)

# Add the lines to the plot

abline(v = quants_fd, lty = 2, col = viridis(4)[4])

ggplot2

# Expected Values, West
ggplot() +
  geom_histogram(aes(x = EV_combined[,2]),
    boundary = 5.4,
    binwidth = 0.1,
    color = "white",
    fill = viridis(4)[1]
  ) +
  labs(
    x = "Expected Values for the voteshare of Die Linke for districts in the west
     (With unemployment at its mean.)",
    y = "Frequency"
  ) +
  geom_vline(xintercept = c(quants_combined[, 2]),
             color = viridis(4)[4],
             linetype = "dashed") +
  theme_classic() +
  scale_x_continuous(breaks = c(seq(5.4, 6.4, by = 0.2)))

# Expected values, East
ggplot() +
  geom_histogram(aes(x = EV_combined[, 1]),
    boundary = 19,
    binwidth = 0.5,
    color = "white",
    fill = viridis(4)[2]
  ) +
  labs(
    x = "Expected Values for the voteshare of Die Linke for districts in the east
     (With unemployment at its mean.)",
    y = "Frequency"
  ) +
  geom_vline(xintercept = c(quants_combined[, 1]),
             color = viridis(4)[4],
             linetype = "dashed") +
  theme_classic() +
  scale_x_continuous(breaks = c(seq(19, 23, by = 1)))

# First Difference
ggplot() +
  geom_histogram(aes(x = fd),
    boundary = 19,
    binwidth = 0.5,
    color = "white",
    fill = viridis(4)[3]
  ) +
  labs(
    x = "First Differences for the voteshare of Die Linke between East and West
     (With unemployment at its mean.)",
    y = "Frequency"
  ) +
  geom_vline(xintercept = c(quants_fd),
             color = viridis(4)[4],
             linetype = "dashed") +
  theme_classic() +
  scale_x_continuous(breaks = c(seq(13, 27, by = 1)))

So far the scenarios were rather boring. Let’s do something more exciting! We calculate expected values over a range of unemployment.

We go back to Step 3.

Step 3 - Choose covariate values.

This time we choose a range of covariate values.

unemp <- seq(0, 20, 0.1)  # Range from 0 to 20, in steps of 0.1
scenario_east <- cbind(1, unemp, 1, unemp * 1) 
scenario_west <- cbind(1, unemp, 0, unemp * 0)

Step 4 - Calculate Quantities of Interest

EV_range_east <- S %*% t(scenario_east)
EV_range_west <- S %*% t(scenario_west)

dim(EV_range_west)
## [1] 1000  201
# Quantiles, we again use apply and our quants_mean_fun
quants_range_east <- apply(EV_range_east, 2, quants_mean_fun)
quants_range_west <- apply(EV_range_west, 2, quants_mean_fun)

Step 5 - Summarize Results

Base R

# Plot
plot(
  unemp,
  quants_range_east[2,],
  pch = 19,
  cex = 0.3,
  bty = "n",
  las = 1,
  ylim = c(0, 35),
  ylab = "Voteshare (%)",
  main = "Expected Voteshare (Die Linke)",
  xlab = "Range of Unemployment",
  type = "n"
)

# Let's add our actual observations.

points(df$unemployment,
       df$leftvote,
       pch = 19,
       col = adjustcolor(viridis(3)[1], alpha = 0.5))

# Now we add the lines.
lines(unemp, quants_range_east[3,], 
      col = viridis(3)[2])

lines(unemp, quants_range_west[3,],
      col = viridis(3)[3])

# Let's add those confidence intervals.

# First, for east:
lines(unemp, quants_range_east[1,], lty = "dashed", col = viridis(3)[2])
lines(unemp, quants_range_east[2,], lty = "dashed", col = viridis(3)[2])

# And for west:

lines(unemp, quants_range_west[1,], lty = "dashed", col = viridis(3)[3])
lines(unemp, quants_range_west[2,], lty = "dashed", col = viridis(3)[3])

# Add a legend

legend("topleft",
       lty = "solid",
       col = viridis(3)[2:3],
       legend = c("East", "West"),
       cex = 0.8,
       bty = "n")

There are many different ways to plot confidence intervals. Pick the style you like most. E.g., we can use polygons to plot the confidence intervals:

plot(
  unemp,
  quants_range_east[2,],
  pch = 19,
  cex = 0.3,
  bty = "n",
  las = 1,
  ylim = c(0, 35),
  ylab = "Voteshare (%)",
  main = "Expected Voteshare (Die Linke)",
  xlab = "Range of Unemployment",
  type = "n"
)

points(df$unemployment,
       df$leftvote,
       pch = 19,
       col = adjustcolor(viridis(3)[1], alpha = 0.8))

polygon(
  x = c(unemp, rev(unemp)),
  y = c(quants_range_east[1 ,], rev(quants_range_east[2 ,])),
  col = adjustcolor(viridis(3)[2], alpha = 0.5),
  border = NA
)
polygon(
  x = c(unemp, rev(unemp)),
  y = c(quants_range_west[1 ,], rev(quants_range_west[2 ,])),
  col = adjustcolor(viridis(3)[3], alpha = 0.5),
  border = NA
)


lines(unemp, quants_range_east[3,], col = viridis(3)[2])  # In this case I usually plot the polygons first and then the lines.

lines(unemp, quants_range_west[3,],
      col = viridis(3)[3])

# Add a legend

legend("topleft",
       lty = "solid",
       col = viridis(3)[2:3],
       legend = c("East", "West"),
       cex = 0.8,
       bty = "n")

ggplot2

# data frame for EV west
plot_ev_west <- data.frame(t(quants_range_west))
names(plot_ev_west) <- c("ci_lo", "ci_hi", "mean")
plot_ev_west$unemp <- unemp
plot_ev_west$east <- 0

# data frame for EV east
plot_ev_east <- data.frame(t(quants_range_east))
names(plot_ev_east) <- c("ci_lo", "ci_hi", "mean")
plot_ev_east$unemp <- unemp
plot_ev_east$east <- 1

# combine data frames
plot_ev <- rbind(plot_ev_west, plot_ev_east)

# plot
ggplot(
  data = df,
  aes(x = unemployment, y = leftvote,
      group = east)
) +
  geom_point(
    color = viridis(2, 0.5)[1]
  ) +
  # add mean expected values
  geom_line(data = plot_ev, aes(x = unemp, 
                                y = mean,
                                group = east)) +
  # add confidence intervals
  geom_line(data = plot_ev, aes(x = unemp, 
                                y = ci_lo,
                                group = east),
            linetype = "dashed") +
  geom_line(data = plot_ev, aes(x = unemp, 
                                y = ci_hi,
                                group = east),
            linetype = "dashed") +
  theme_classic() +
  labs(
    x = "Unemployment in %",
    y = "Predicted Voteshare (Die Linke) in %",
    color = "",
    title = "Left Voteshare and Unemployment"
  ) +
  theme(legend.position = "none")

4 Exercise Session

Today’s exercise session is not about the generation of new code, but about better understanding what we did so far. Together with your neighbour, try to answer the following questions by going through and playing around with the code that we have written in the previous lines.

  1. What does mvrnorm do?
  2. What are the dimensions of S? Why?
  3. What do you need to specify in a scenario vector/interesting covariate values? (E.g. scenario_east)
  4. Is the order of the elements in the scenario vector important?
  5. What would we need to do to get predicted values instead of expected values?

5 Simulation of Predicted Values Y|X

Now we also want to simulate predicted values. What’s different from expected values?

  • Step 1 - Get the regression coefficients.
    • Exactly the same as above.
  • Step 2 - Generate sampling distribution.
    • Exactly the same as above.
  • Step 3 - Choose covariate values.
    • Exactly the same as above.
X_east <- c(1, mean(df$unemployment), 1, mean(df$unemployment) * 1) # East
X_west <- c(1, mean(df$unemployment), 0, mean(df$unemployment) * 0) # West

X <- as.matrix(rbind(X_east, X_west))

Step 4 - Calculate Quantities of Interest: Predicted Values

This is still the same as above.

EV_combined <- S %*% t(X)

Now we need to add something. Remember sigma_est (i.e. \(\hat\sigma\)) from last lab/lecture? That’s the fundamental uncertainty!

# Y ~ N(mu_c, sigma_est)
sigma_est <- sqrt(sum(residuals(reg)^2) / (nrow(df) - length(beta_hat)))
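If you want to double-check this computation by hand: lm() stores the same residual standard error, so the following comparison (using the reg object from above) should hold up to floating-point precision.

```r
# summary(reg)$sigma is the residual standard error computed by lm(),
# i.e. sqrt(RSS / df.residual) -- identical to our sigma_est.
all.equal(sigma_est, summary(reg)$sigma)
```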

Y_hat_east <- EV_combined[,1] + rnorm(nsim, 0, sigma_est)
Y_hat_west <- EV_combined[,2] + rnorm(nsim, 0, sigma_est)


# Quantiles
quants_east <- quants_mean_fun(Y_hat_east)
quants_west <- quants_mean_fun(Y_hat_west)

Let’s plot it:

# Histogram Predicted Values West Germany
hist(Y_hat_west,
     las = 1,
     main = "Histogram of Predicted Values (West Germany)",
     col = viridis(3)[1], 
     border = "white")
abline(v = c(quants_west[1:3]), lty = 2, col = viridis(3)[3])

# Histogram Predicted Values East Germany
hist(Y_hat_east,
     las = 1,
     main = "Histogram of Predicted Values (East Germany)",
     col = viridis(3)[2], 
     border = "white")
abline(v = c(quants_east[1:3]), lty = 2, col = viridis(3)[3])

# We could put both distributions in one plot.
plot(density(Y_hat_west), 
     xlim = c(0,40), 
     lwd = 2 ,
     bty = "n", 
     yaxt = "n", 
     ylab = "", 
     xlab = "Left Voteshare in %",
     main = "Predicted Values for Voteshare",
     type = "n")
lines(density(Y_hat_west, from = min(Y_hat_west), to = max(Y_hat_west)), lwd = 2, col = viridis(3)[1])
lines(density(Y_hat_east, from = min(Y_hat_east), to = max(Y_hat_east)), lwd = 2, col = viridis(3)[2])
abline(v = c(quants_west[1:3]), lty = 2, col = viridis(3)[3])
abline(v = c(quants_east[1:3]), lty = 2, col = viridis(3)[3])

Let’s also do it for a range of unemployment.

Step 4 - Calculate Quantities of Interest over a range

EV_range_east <- S %*% t(scenario_east)
EV_range_west <- S %*% t(scenario_west)

Y_hat_range_east <- EV_range_east + rnorm(nsim, 0, sigma_est)
Y_hat_range_west <- EV_range_west + rnorm(nsim, 0, sigma_est)
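Note that the rnorm(nsim, 0, sigma_est) vector is recycled across the columns of the EV matrices, so within each row every scenario gets the same error draw. The column-wise quantiles below are still valid; but if you would rather have an independent draw for every simulation-scenario cell, one alternative (using the same objects as above) is:

```r
# Alternative: an independent fundamental-uncertainty draw per cell
Y_hat_range_east <- EV_range_east +
  matrix(rnorm(nsim * ncol(EV_range_east), 0, sigma_est),
         nrow = nsim)
```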

# Quantiles, we again use apply and our quants_mean_fun
Y_quants_range_east <- apply(Y_hat_range_east, 2, quants_mean_fun)
Y_quants_range_west <- apply(Y_hat_range_west, 2, quants_mean_fun)

Plot it with polygons as confidence intervals.

plot(
  unemp,
  Y_quants_range_east[2, ],
  las = 1,
  bty = "n",
  pch = 19,
  cex = 0.3,
  ylim = c(0, 45),
  ylab = "Voteshare (%)",
  main = "Predicted Voteshare (Die Linke)",
  xlab = "Range of Unemployment",
  type = "n"
)

points(df$unemployment,
       df$leftvote ,
       pch = 19,
       col = adjustcolor(viridis(3)[1], alpha = 0.8))

polygon(
  x = c(unemp, rev(unemp)),
  y = c(Y_quants_range_east[1 , ], rev(Y_quants_range_east[2 , ])),
  col = adjustcolor(viridis(3)[2], alpha = 0.5),
  border = NA
)
polygon(
  x = c(unemp, rev(unemp)),
  y = c(Y_quants_range_west[1 , ], rev(Y_quants_range_west[2 , ])),
  col = adjustcolor(viridis(3)[3], alpha = 0.5),
  border = NA
)


lines(unemp, Y_quants_range_east[3, ],
      col = viridis(3)[2])

lines(unemp, Y_quants_range_west[3, ],
      col = viridis(3)[3])

# Add a legend

legend("topleft",
       lty = "solid",
       col = viridis(3)[2:3],
       legend = c("East", "West"),
       cex = 0.8,
       bty = "n")

# data frame for PV west
plot_pv_west <- data.frame(t(Y_quants_range_west))
names(plot_pv_west) <- c("ci_lo", "ci_hi", "mean")
plot_pv_west$unemp <- unemp
plot_pv_west$east <- 0

# data frame for PV east
plot_pv_east <- data.frame(t(Y_quants_range_east))
names(plot_pv_east) <- c("ci_lo", "ci_hi", "mean")
plot_pv_east$unemp <- unemp
plot_pv_east$east <- 1

# combine data frames
plot_pv <- rbind(plot_pv_west, plot_pv_east)

# plot
ggplot(
  data = df,
  aes(x = unemployment, y = leftvote,
      group = east)
) +
  geom_point(
    color = viridis(2, 0.5)[1]
  ) +
  # add mean expected values
  geom_line(data = plot_pv, aes(x = unemp, 
                                y = mean,
                                group = east)) +
  # add confidence intervals
  geom_line(data = plot_pv, aes(x = unemp, 
                                y = ci_lo,
                                group = east),
            linetype = "dashed") +
  geom_line(data = plot_pv, aes(x = unemp, 
                                y = ci_hi,
                                group = east),
            linetype = "dashed") +
  theme_classic() +
  labs(
    x = "Unemployment in %",
    y = "Predicted Voteshare (Die Linke) in %",
    color = "",
    title = "Left Voteshare and Unemployment"
  ) +
  theme(legend.position = "none")

The confidence bounds are wider than those for the expected values because predicted values additionally take the fundamental uncertainty of our model into account. To see this, we can plot the expected values plot and the predicted values plot side by side.

par(mfrow=c(1,2))

# Plot "Expected Voteshares" of Die Linke

plot(
  unemp,
  quants_range_east[2,],
  pch = 19,
  cex = 0.3,
  bty = "n",
  las = 1,
  ylim = c(0, 45),
  ylab = "Voteshare (%)",
  main = "Expected Voteshare (Die Linke)",
  xlab = "Range of Unemployment",
  type = "n"
)

points(df$unemployment,
       df$leftvote,
       pch = 19,
       col = adjustcolor(viridis(3)[1], alpha = 0.8))

polygon(
  x = c(unemp, rev(unemp)),
  y = c(quants_range_east[1 ,], rev(quants_range_east[2 ,])),
  col = adjustcolor(viridis(3)[2], alpha = 0.5),
  border = NA
)
polygon(
  x = c(unemp, rev(unemp)),
  y = c(quants_range_west[1 ,], rev(quants_range_west[2 ,])),
  col = adjustcolor(viridis(3)[3], alpha = 0.5),
  border = NA
)


lines(unemp, quants_range_east[3,], col = viridis(3)[2])  # In this case I usually plot the polygons first and then the lines.

lines(unemp, quants_range_west[3,],
      col = viridis(3)[3])

# Add a legend

legend("topleft",
       lty = "solid",
       col = viridis(3)[2:3],
       legend = c("East", "West"),
       cex = 0.8,
       bty = "n")

# Plot "Predicted Voteshares" of Die Linke

plot(
  unemp,
  Y_quants_range_east[2, ],
  las = 1,
  bty = "n",
  pch = 19,
  cex = 0.3,
  ylim = c(0, 45),
  ylab = "Voteshare (%)",
  main = "Predicted Voteshare (Die Linke)",
  xlab = "Range of Unemployment",
  type = "n"
)

points(df$unemployment,
       df$leftvote ,
       pch = 19,
       col = adjustcolor(viridis(3)[1], alpha = 0.8))

polygon(
  x = c(unemp, rev(unemp)),
  y = c(Y_quants_range_east[1 , ], rev(Y_quants_range_east[2 , ])),
  col = adjustcolor(viridis(3)[2], alpha = 0.5),
  border = NA
)
polygon(
  x = c(unemp, rev(unemp)),
  y = c(Y_quants_range_west[1 , ], rev(Y_quants_range_west[2 , ])),
  col = adjustcolor(viridis(3)[3], alpha = 0.5),
  border = NA
)


lines(unemp, Y_quants_range_east[3, ],
      col = viridis(3)[2])

lines(unemp, Y_quants_range_west[3, ],
      col = viridis(3)[3])

Concluding Remarks

In your homework you will have a lot of fun with simulations.

References

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61.